The Machine Perception Laboratory represents a new paradigm for
research in Artificial Intelligence at the Computer Science Department of
UCLA. It is based on a synergistic intermixing of methods and knowledge
from the fields of Artificial Intelligence and Neuroscience.


-o- Neuroscience is a source of fundamental concepts about the function
and mechanism of natural vision and perception; it motivates our view of the
inseparability of algorithms and neural substrate.


-o- AI explores computational theories of vision and perceptual reasoning
by inventing algorithms and implementing them as "connectionist"
architectures.


a. Intellectual motivation 

Intellectual motivations that unify studies of human and machine
perception, including vision, touch, proprioception, range and other sensory
modalities, derive from the assumption that information processing is
fundamental for intelligent behavior. Perception, spatial reasoning and
learning are the attributes that will differentiate the next generation of
robots from present-day automated manufacturing. The ultimate test for
Artificial Intelligence is the invention of an autonomous mobile robot whose
"intelligent" behavior emerges from linking perception to motor output.
Modern computer science plays a pivotal role in understanding information
processing systems. On the other hand, the mechanisms and functions of
information processing underlying human intelligence are in the domain of
the Neurosciences. The rapid growth of these disciplines in recent years is
advancing our understanding of perception. It is hoped that an
interdisciplinary combination of Artificial Intelligence and Cognitive
Science will provide more rigorous, scientific foundations for this research.


The underlying intent of this interdisciplinary approach is to transform
scientific knowledge into an engineering form of general-purpose machine
perception by viewing "neural" connections as a paradigm for parallel
computations.

The future of intelligent robots depends on the successful implementation
of a robust perceptual system. Although many clever forms of robotic
vision have been engineered, general-purpose machine perception remains
a distant goal. Computing architectures best suited for global perceptual
function pose one type of problem. Another problem stems from the
limitations of the sequential computing paradigm, where the number of
functions which naturally map onto the Von Neumann architecture is
restricted. In natural systems, visual functions are supported by a variety
of parallel structures. This motivates our belief that future advances in
general-purpose perception must assume inseparability of function from
structure.

Our prototypical computational architecture consists of hierarchically
structured layers of processing units that perform dedicated functions.
Both discrete and real-value passing architectures are considered. The
physical representation of transduced stimuli is implemented as a
well-structured connectivity between "neurons", and the computations are
performed by the types and weights of different connections. More precisely,
the computation is the result of some process, realized as "neuronal"
functions, that is applied to a spatio-temporal "image" of signals. The
process and the constraints are embedded into our connectionist
architecture. The translation to more abstract levels is done through
aggregation of features by an interpreter, which in early vision may be
implemented by fixed connections. The ultimate goal of this project is to
conceptualize a computing structure which could eventually be implemented
in hardware.
I. COMPUTERS AND BRAINS - MOTIVATION


This paper is divided into four sections. First we outline the
intellectual needs for integrating the knowledge about perception in man and
machine. The second section presents our notion of large grain architecture
as a computational environment for studying global functions of machine
perception. In the third part we describe the small grain architectures
represented by "neural networks" that provide a computational substrate for
perceptual functions. We conclude with architectural models of two early
vision operations implemented as neural networks that embody the principle
of inseparability between structure and function.


What can be expected from a general theory of perception
developed by such a cross-disciplinary approach? In the short term it should
help us understand how the elements of perception have evolved in natural
systems and what their limits are. In the long run, a theory of perception
should help us to formulate questions that extend beyond the presently
limited engineering knowledge of this function. For example, can we improve
upon biological perception when implementing these functions in mobile
robots? Is human perception limited by characteristics inherent only to
biological systems? Are these limits imposed by algorithmic principles or by
the underlying substrate? What is the grain of computing architecture most
suitable for cognition and perception?

b. Perception and AI

Our working goal for Machine Perception, and in particular for
Computer Vision, is the development of computing systems that can
accomplish tasks previously achieved only with human intelligence (1).
Discovery of heuristics used to constrain the problem according to physical
laws should eventually lead to models of greater generality (2). In the past
these efforts were strongly limited by the computational architectures
available to the designer. The sequential computing paradigm limits
solutions for computer vision that can operate in real time by restricting
the selection of functions that naturally map onto the Von Neumann
architecture. In natural systems, visual functions are supported by a gamut
of physical structures that are inherently massively parallel (3). Hence, we
believe that further progress in the realization of general-purpose computer
vision that operates in real time must be based on the assumption that
function and the underlying computational substrate are inseparable. The
chances of success can be maximized by combining the traditional,
forward-engineering approach to the synthesis of computer vision systems
with the analytic viewpoint characteristic of the Neurosciences, where the
intent is to reverse engineer the solution. This is difficult because
current knowledge about the anatomy and physiology of neuronal networks
underlying the manipulation of mental imagery does not allow easy
introspection on such processes at the level of subcognitive computation
(4, 5). Nevertheless, models of mental computation underlying perception and
cognition must be built and verified. Approximation of such tests at the
present time is possible only through computational models in the realm of
AI (6). Our approach to studies of cognitive and perceptual functions is
detailed in the next section; it involves a coarse grain architecture
represented by networked AI workstations. The notion of local computation
supported by fine grain architectures resembling neural networks is
developed in the third section.

Perception may be thought of as an example of a continuous
problem-solving operation. It is an active process during which hypotheses
are formed about the surrounding environment (see 7). Sensory information
acquired through vision, touch, smell, sound and proprioception is
integrated to evaluate these hypotheses (8). In each sensory modality,
analog data must first be acquired and preprocessed. This stage is similar
to the data-driven signal processing operations that are well understood in
the realm of Electrical Engineering. The next stage involves segmentation
and labeling of the preprocessed sensory data (2, 1). The last stage
involves understanding of the sensory information in every modality and its
integration for perceptual reasoning. This representational view of
processing derives from the generally accepted model of visual perception.
Considering recent advances in computer-based simulations, we can implement
any model of perception in software. A critical question is which mechanisms
must be incorporated into hardware to guarantee human-like performance.
Which architectures would make basic perceptual capabilities, including
learning and problem solving, feasible for autonomous mobile robots? Natural
computation is based on principles different from those embodied in
computers. It is a task-oriented process in which the current situation,
including goals and drives, directly determines the next action (9). The
human brain has many highly developed structures dedicated to performing
different functions, even though externally it appears to act as a
general-purpose system. Unravelling the mysteries of perception and
cognition is one of this century's major scientific challenges.

c. Neuronal architectures and parallel computation 

The inspiration that AI derives from Neuroscience is based on the
assumption that manipulation of symbolic representations is fundamental to
the emergence of intelligence (2, 10). Hence, computers as
symbol-manipulating systems could allow us to create and test models of
perception as computational activities of the brain. Since we are the
keepers of information about this world, we can construct programs and data
structures that represent, internally to the computer, any concept that
normally refers to the external environment. The simulation running on a
computer can perhaps be likened to cognitive processes that allow us to
reason about the consequences of physical actions before they take place.
The central question is whether we could create an artificial symbolic
system that uses sensory information to construct abstract representations
of the external world. If AI techniques allow us to realize such symbolic
behavior in a computer-based system, will it have to be based on neural
principles (9)? And if so, how can we implement symbolic processing in terms
of neural networks?

The desirability of neuronal architectures derives from massive
parallelism (hence, real-time performance) and computation based on
connectivity (hence, simplicity) (11, 12, 13). Parallel computation has
recently become a major concern for computer science. The constraints of
solid state physics limit further evolution of sequential machines to
increasing speed via optical computing, and developments in VLSI favor
parallel architectures. To gain speed, one school within the parallel
computing paradigm assumes that computation can be performed by a pattern of
connections between slow and simple processors (11, 12, 13).

Fine grain massively parallel architectures are similar to neuronal
structures in the sense that they are based on millions of interacting
processors. One of our immediate research problems is to investigate how we
can realize such structures and how to compute with them. Because of their
close resemblance to the anatomy of natural computing structures, this class
of architectures might offer the most plausible solution to machine
perception in real time (12, U, 9, 15).

Past approaches to computer vision were based on the assumption
that it can be solved in the abstract domain, unrelated to the underlying
physical mechanism (1, 16). Our approach differs because we constrain the
problem by requiring a solution to be implementable in a 3-D connectionist
architecture. The fundamental premise of connectionism is that individual
neurons do not actively manipulate large amounts of symbolic information
(12). One of the major modes of information processing in neural systems can
be described in terms of the relative strengths of synaptic connections.
Therefore, rather than using complex units that manipulate symbolic inputs,
connectionist architectures compute by modulating signals with appropriately
connected simple units. Hence, the computation is a form of
cooperative/competitive relaxation process, taking place in a distributed
net of "neural" elements.

Our approach is different from multilayer perceptrons because we
propose that each unit has an S-shaped transfer characteristic (44), which
can be modeled by: V = Vmax [X / (X + k)], where V is the output, Vmax is
the saturating level of the output signal, X is the input and k is the input
value that generates the half-maximal response. This is consistent with
physiological evidence for a saturating membrane response and distributed
synaptic inputs. The sigmoidal function allows for automatic sensitivity
control and computation of relative values in the context of the
neighborhood, among other properties. Thus, unlike the "binary" thresholding
function in perceptrons, our networks will always operate in the most
optimal configuration (17).
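
The transfer characteristic above can be sketched in a few lines of Python.
This is only an illustration of the formula V = Vmax [X / (X + k)] as stated
in the text; the function name, the default parameter values and the
restriction to non-negative inputs are assumptions made here, not part of
the original model description.

```python
def saturating_response(x: float, v_max: float = 1.0, k: float = 1.0) -> float:
    """Return V = Vmax * X / (X + k) for a non-negative input x.

    k is the input value that yields the half-maximal response Vmax / 2.
    """
    if x < 0:
        # Assumption for this sketch: inputs are non-negative signal levels.
        raise ValueError("input must be non-negative")
    if k <= 0:
        raise ValueError("k must be positive")
    return v_max * x / (x + k)


if __name__ == "__main__":
    # The response grows with the input but never reaches v_max.
    for level in (0.0, 0.5, 2.0, 8.0, 128.0):
        print(level, saturating_response(level, v_max=10.0, k=2.0))
```

Plotted against the logarithm of X, this curve is S-shaped and centered on
X = k, so changing k shifts the operating range of the unit without
changing the shape of its response.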

The "neuronal" operators can have thousands of inputs and tens of
outputs. A continuous output value can be generated as a thresholded
hyperbolic tangent function of weighted inputs. Weights allow us to
implement both positive and negative averages. Presynaptic inhibition,
dendro-dendritic synapses and the concept of relative changes carrying
information complete the architectural environment. These elements allow the
implementation of convergence and divergence of signal pathways as well as
lateral interactions between spatially distinct nodes. Simulation of
specific computing architectures is supported by UCLA-PUNNS, a neural
network simulator developed in my laboratory to address the question of
inseparability of function and computing substrate (18).
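
A minimal sketch of one such operator is given below. The text does not say
how the hyperbolic tangent is "thresholded", so this example assumes one
plausible reading: a threshold is subtracted from the weighted sum and
negative results are clipped to zero. The function and parameter names are
hypothetical.

```python
import math
from typing import Sequence


def unit_output(inputs: Sequence[float],
                weights: Sequence[float],
                threshold: float = 0.0) -> float:
    """Continuous output of a single operator (assumed form).

    Positive weights act as excitatory connections, negative weights as
    inhibitory ones. The output is tanh(weighted sum - threshold), clipped
    at zero so that sub-threshold input produces no output.
    """
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    net = sum(w * x for w, x in zip(weights, inputs))
    return max(0.0, math.tanh(net - threshold))


if __name__ == "__main__":
    # Excitation and inhibition of equal strength cancel out.
    print(unit_output([1.0, 1.0], [0.5, -0.5]))
    # Net excitation above threshold gives a graded, saturating output.
    print(unit_output([2.0, 1.0], [1.0, -0.5], threshold=0.5))
```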

Principles of computation behind our simulated model are inspired
by the neurophysiology of interacting neurons (19):

-o- Concurrent computation is supported by parallel active connections
between neuronal operators, arranged in a hierarchy of layers.

-o- Computation is performed in the analog domain and can be simulated
as real-value passing networks.

-o- For early processing stages all intra- and inter-layer connections are
fixed, and control is executed by feedback pathways which selectively
modulate activity in single operators.

-o- Adaptive properties of the networks derive from relaxation-like
behaviour, computed by each layer at multiple scales of resolution.

-o- The cooperative and competitive modes of relaxation are computed by
agonistic and antagonistic lateral interactions between neuronal operators.

-o- Connections are modeled by weights resembling synapses with a
sigmoidal input-output characteristic.

-o- Abstractions at higher levels are defined by the specific architecture
of connections.

-o- Segmentation is partially determined via bottom-up linking of many
simultaneously computed images of primitive attributes.


Our principal architectural module is a three-layer computing
structure. The INPUT layer carries a topologically correct representation of
the scene. The OUTPUT layer is an abstraction which does not have to be
spatially indexed to the original image. Local constraints are built into
the layers. Global and local constraints are computed by the CONTEXT layer.
The advantage of our concept is that it is general enough to allow the
implementation of a parallel architecture for signal manipulation and for
aggregation of feature maps in the symbolic domain.

d. Neural net representation of perceptual knowledge. 

In Computer Vision systems, programs performing visual functions
are constrained by the architecture. The robustness of the human perceptual
system stems from its ability to adapt/program itself. Thus novel stimuli
can be processed by newly developed computing structures. Plasticity itself
does not explain perception, but the ability to program new knowledge and to
search for alternative hypotheses is fundamental to perceptual tasks. A
priori knowledge of the selection criterion will always allow us to search
exhaustively and find an optimal model that satisfies the postulated
hypothesis. The question is, however, whether such a solution and its
alternatives can be identified in a reasonable time. Hence, the need for
massively parallel computation in the form of neural nets.

We know that knowledge allows us to optimize the search process (1).
This poses the question of how to organize and represent knowledge in
memory so that it can be easily accessed at the right time (20). Factual
knowledge, as opposed to "how-to" knowledge, can be organized into networks
of associations, so that access to one part provides connections to other
relevant parts. The knowledge about the scene must include the specifics
of visually perceived objects plus the knowledge about a variety of objects
in all related scenes or functions. This suggests a hierarchical, as well as
associational, structure. How to realize such an architecture with a
connectionist structure, how to map the relevant knowledge onto patterns of
connections, and how to make it program itself by changing connectivity
without "forgetting" are some of the questions that we are facing.

Perceptual knowledge must incorporate world information derived
from the integration of different sensory modalities. "Nihil est in
intellectu quod non sit prius in sensu" (St. Thomas Aquinas, 13th c.): there
is nothing in our intellect that did not pass through our senses. Most of
our knowledge about the environment comes to us through one of the five
senses. Hence, understanding the workings of these systems is a prime
scientific problem. This problem is magnified in the technological realm.
Vision is indispensable for autonomous mobile robots, and there is some
progress in this area. Other sensory modalities are more neglected, because
it is not clear how best to use them and how to implement practical
solutions. In general, a solution to sensory interactions with the
environment is a precursor to adaptable, intelligent performance in, for
example, industrial settings or space exploration (21). The problem of the
best architectures or environment for studying questions related to sensory
integration is open. The key questions that must be addressed are transmodal
equivalences, sensory-mode-specific knowledge and constraints, merging of
modality-specific representations, and disambiguating conflicting
modality-specific information. These problems represent an important
scientific challenge to the implementation of machine perception.


II. MACHINE PERCEPTION LABORATORY 


The coarse grain architecture of the machine perception environment
consists of four networked AI workstations, each performing a dedicated
function (fig. 1). The vision station simulates the action of the "EYE" and
some higher-level visual functions. The "HAND" is a separate station that
provides the environment for studying manipulation and locomotion in
support of perceptual tasks. The Ethernet fulfills the role of the spinal
cord by allowing the integration of other sensory modalities, such as range,
proximity, touch, etc., controlled by the "SENSE" workstation. The fourth AI
workstation simulates the higher-level cognitive functions of the "BRAIN".
The ultimate goal of this evolving architecture is to build an environment
where, by experimenting with global functions of machine vision and
perception, we could reduce scientific concepts to engineering solutions.


Although a complete theory of perception is a distant goal, both
machine intelligence and humans must acquire and manipulate information
from the environment. Moreover, this information must be organized into a
store of knowledge that can be applied to future problems. LISP-based
environments offer many advantages for experimenting with issues related to
highly adaptive, multisensory-based robotic systems. Such integrated
environments will allow us to approach problems of vision, sensory
integration, assembly and inspection as general scientific issues of
planning, perception, problem solving and spatial reasoning. The Machine
Perception Laboratory (MPL) offers a realistic experimental test-bed for
developing and validating various hypotheses related to robotic perception
and intelligence. It also allows us to integrate and evaluate different
software packages dealing with perception and manipulation.

Some software systems and tools are currently available, in either
academic or industrial environments, that could enhance performance and
eliminate the expense of rediscovery. Of course this is feasible only if
there is an environment which easily allows integration of existing systems.
These packages include, among others, systems for vision, planning,
decision-making, data fusion, reasoning, problem solving, etc. The MPL,
including all hardware/software systems, is in a continuous state of
evolution and offers a diversified experimental environment spanning the
fields of computer science, artificial intelligence, robotics and cognitive
sciences.

a. System organisation 

The system can be seen as a hierarchical organization of separate
processes running on different workstations (22). Each dedicated station is
a complete LISP-based environment extended with functions and procedures
appropriate for experimenting in its domain. Part of the integration issue
is addressed by extending the total environment with functions that accept,
interpret and execute commands emanating from a dedicated station called
"BRAIN" and send back the results of their domain-specific computations. In
this sense the dedicated station behaves as a lower-level entity, capable of
understanding high-level commands and executing them by triggering
specialized procedures appropriate to the task.
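
The command flow described above can be illustrated with a small in-process
sketch. The laboratory itself is LISP-based and networked over Ethernet;
the Python classes, method names and the "count-bright" command below are
hypothetical stand-ins chosen only to show the pattern of a "BRAIN" issuing
high-level commands that a dedicated station maps onto its own procedures.

```python
from typing import Any, Callable, Dict


class Station:
    """A dedicated station that executes named, domain-specific procedures."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._procedures: Dict[str, Callable[..., Any]] = {}

    def register(self, command: str, procedure: Callable[..., Any]) -> None:
        self._procedures[command] = procedure

    def execute(self, command: str, *args: Any) -> Any:
        # Interpret the high-level command by triggering the matching procedure.
        if command not in self._procedures:
            raise KeyError(f"{self.name} does not understand {command!r}")
        return self._procedures[command](*args)


class Brain:
    """Task-level coordinator that distributes commands to stations."""

    def __init__(self) -> None:
        self._stations: Dict[str, Station] = {}

    def attach(self, station: Station) -> None:
        self._stations[station.name] = station

    def request(self, station_name: str, command: str, *args: Any) -> Any:
        # Send the command to the named station and return its result.
        return self._stations[station_name].execute(command, *args)


if __name__ == "__main__":
    eye = Station("EYE")
    # Hypothetical vision procedure: count pixels above a brightness level.
    eye.register("count-bright",
                 lambda image, level: sum(p > level for row in image for p in row))
    brain = Brain()
    brain.attach(eye)
    print(brain.request("EYE", "count-bright", [[0, 9], [7, 2]], 5))
```

In this arrangement a new capability is added by registering one more
procedure on a station, which mirrors the incremental integration the text
describes.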

Such stations perform multitasking operations in their domains
while at the same time running under the multitasking environment of the
"BRAIN". This will ease the TOP-DOWN integration of the system, since
programming will be limited to writing specialized functions for the brain.
Message-passing programming, inherent in an advanced AI environment, will
ease inter-task communication and the integration of new workstations. At
the same time, the implementation of the system as a network of independent
stations will preserve their integrity, support parallel execution (vision
and touch), and perhaps allow for easy integration of software written in
other languages.

Figure 1. Global computing environment for the Machine Perception
Laboratory.


This implementation assumes that stations are loosely coupled. It is
not our intention to study problems of close interactions between
subsystems or patterns of data flow and performance in real time. It is
intended, however, that the major portion of computation inherent to a
specific domain, for example vision, will be performed on a dedicated
station called "EYE".

b. Integration 

One of the key problems in setting up the MPL is integration. Issues of
synchronization, programmability, communication, load balancing and
parallel execution are all nontrivial problems and could be analyzed by
established areas of computer science. We envision that the MPL will consist
of a few networked AI-based workstations and dedicated computers. Because of
this, a unified LISP environment will alleviate many problems inherent in
the integration of such complex systems. One of the key issues of
integration will be to combine symbolic and numeric computation in each
sensory modality. An example of a successful solution to this problem in the
area of vision is given in (23).

Our initial research in perceptual functions is focused on the
integration of vision with other sensory modalities. Hence the notion of
multiple networked AI workstations, each dedicated to a separate perceptual
function. The LISP environment provides tools for easy integration of
separate processes operating on different workstations in the network.
Additionally, it allows for easier incorporation of software modules written
in other languages.

The "BRAIN" plays the role of organizing problems at the task level,
and it assumes the responsibility of distributing computing to the proper
stations. Using a LISP environment to implement the "BRAIN" facilitates and
enhances its performance. It is relatively easy to create facilities for
programming functions that can request services of remote procedures, gather
high-level information from different sensory modalities, and interrupt or
activate processes, such as manipulation, running on the other stations.

Such an environment lends itself to incremental development and
testing of complex perceptual behavior. Separately developed and tested
sensory or manipulation operations can be integrated as primitive functions
in the "BRAIN's" repertoire. Task-level programming, world modeling, and
manipulation of symbolically represented information are fundamental to the
implementation of cognitive functions (24).


III. UCLA PUNNS: NEURAL NET SIMULATOR

The previous section presented an example of a coarse grain
architecture, most suitable for studying global functions of perception. In
this part we focus on an environment for studying neural networks as the
physical substrate underlying local computation in perception. Physical
interactions with our world demand real-time responses. If a machine is to
maneuver and operate in an underconstrained, natural environment, its
efficacy and survivability will also depend on how quickly it can perceive
and respond (25). Natural systems solved the problem of real-time
constraints by using massively parallel neural networks. The capabilities of
an autonomous mobile robot are restricted by the size, weight and power
requirements of the computer (26). The amount of support that a computer
extracts from the machine is one of the critical factors in determining the
feasibility and functional capabilities of a system. Progress in this area
may come from conceptually new architectures based on neuronal principles.
Hence, the need for powerful simulation tools.

Despite numerous studies over the last fifty years, we don't have a
satisfactory explanation of perceptual phenomena. Part of the problem stems
from an inability to describe the process. Von Neumann speculated that the
structure and the state of the neural network might be the simplest way to
describe perception (27). Our approach to machine perception is based on the
assumption that the network structure yields the function and, vice versa,
that the real-time function of perception implies a particular neural
network structure. This approach is motivated by the reductionist view of
neurophysiology, where the principal notion is to explain function in terms
of structure (19).

To investigate the relationship between structure and function, we
have developed PUNNS (Perception Using Neural Network Simulation).
PUNNS (59) is a continuously evolving environment for studying the
functionality of massively parallel computational structures as applied to
image data. The initial focus is to study neural structures that allow
execution of visual functions in constant time, regardless of the size and
complexity of the image. Because of the complexity and cost of building a
neural net machine, a flexible neural net simulator is needed to invent,
study and understand the behavior of complex vision algorithms. Some of the
issues involved in building a simulator are how to compactly describe the
interconnectivity of the neural network, how to input image data, how to
program the neural network, and how to display the results of the network.

a. Neural simulators 

The theoretical properties of pseudo neural networks as applied to
logical computation, learning and adaptation have been extensively explored
and reviewed elsewhere (27, 28, 29, 30, 31). Many of these approaches have
nothing in common with neurophysiology. Nevertheless, they do indicate the
diversity of behavior that results from the interconnection of simple
computational elements. PABLO is an example of a simulator that provides
precise modeling of neurons and their interactions (32). Its environment
closely supports many known properties of soma membrane, synaptic
physiology, dendritic propagation, and axonal transmission. BOSS is another
discrete-event simulator that was designed to investigate large neural
networks (33). In contrast to PABLO, where each individual neuron was
specified and interconnected, BOSS forms a statistical representation of the
connectivity pattern. This allows for the relatively fast simulation of
large connectivity patterns.

Figure 2. Block diagram of PUNNS run-time environment

In contrast to these batch-type simulators, ISCON offers the
advantages of an interpreted simulator and network construction tool (34).
It is written in LISP and allows the user to dynamically change network
connectivity and restart the simulation. The penalty for this flexibility is
that large networks take prohibitively long to execute. To increase
execution speed while maintaining flexibility, ISCON evolved into the
Rochester Connectionist Simulator (35). RCS is a run-time environment
written in C that allows user-written programs to access a library of
connectionist-type functions, e.g. building networks, setting potentials,
examining nodes.

b. PUNNS environment 

The run-time environment of PUNNS is fast and robust (fig. 2).
PUNNS was implemented in C under System V and has been ported to
4.3BSD. The underlying simulation approach is a discrete-time simulation
technique in which each node is visited at each simulation time step. This
approach is especially useful when input data is changing every few time
steps. A connectivity language (eXeL) was developed that describes the
functionality of individual nodes and how they are interconnected. Complex
connectivity patterns using large numbers of nodes can be generated by
eXeL pre-processor routines. These are programs that output eXeL files.
Hence, they are easy to modify when the connectivity pattern must be
adjusted. After loading the eXeL file into PUNNS, the parser builds a data
structure which can be quickly interpreted to produce the simulation of the
neural network. Changing node functions or connectivity is accomplished by
reloading a modified eXeL file. Input and output to the simulation is done
through graphics windows. Real images are used as test data for the
synthesized networks. A node's function can access a particular range of
pixels from a graphics window and can display the result of a node, after
firing, in an output window. Stimulus and response of a net can be displayed
by using multiple windows. Activity levels in a layer can be viewed in one
window, and the window can be saved as an image. This snapshot of activity
can then be placed in an input window and newly loaded layers can continue
processing from it.
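
The discrete-time approach, in which every node is visited at every time
step, can be sketched as follows. PUNNS itself is written in C and driven by
eXeL files; this Python sketch is not its implementation. The class and
method names are hypothetical, and the synchronous update rule (each node
reads the values its inputs had at the previous step) is an assumption made
for the example.

```python
from typing import Callable, Dict, List, Sequence

NodeFunction = Callable[[Sequence[float]], float]


class Network:
    """Toy discrete-time network: every node is visited at every step."""

    def __init__(self) -> None:
        self.values: Dict[str, float] = {}
        self.functions: Dict[str, NodeFunction] = {}
        self.inputs: Dict[str, List[str]] = {}

    def add_node(self, name: str, function: NodeFunction,
                 inputs: Sequence[str] = (), initial: float = 0.0) -> None:
        self.values[name] = initial
        self.functions[name] = function
        self.inputs[name] = list(inputs)

    def step(self) -> None:
        # Freeze the previous state so the visiting order does not matter.
        previous = dict(self.values)
        for name, function in self.functions.items():
            sources = self.inputs[name]
            if sources:
                self.values[name] = function([previous[s] for s in sources])
            # Nodes without inputs hold their value (external stimulus).

    def run(self, steps: int) -> None:
        for _ in range(steps):
            self.step()


if __name__ == "__main__":
    net = Network()
    net.add_node("stimulus", sum, initial=1.0)
    net.add_node("relay", sum, inputs=["stimulus"])
    net.add_node("output", sum, inputs=["relay"])
    net.run(2)  # activity needs one step per connection to propagate
    print(net.values)
```

Because each connection takes one time step to traverse, a change in the
stimulus reaches a node n connections away after n steps.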

In PUNNS, local connections and global mappings are used to
separate the ideas of neighborhood node interactions and the connections
established between functionally different blocks of nodes. Local
connections are responsible for receptive field size and property, while
global mappings may or may not be topologically preserving. A node's
function tells what a node computes from its inputs, and its temporal
properties describe how the excitation level changes over time. The node is
the lowest-level primitive and represents an idealized, lumped-parameter
model of a neuron. A node description specifies inputs from other nodes and
the functions which are to act on these inputs. PUNNS also allows for
dendritic input to a node, with each dendrite having a possibly unique
processing function. All nodes are specified in eXeL files as follows
(italics indicate a user-definable parameter):

    node node-name:
        initial-value, length-of-history;
        node-function:
            dendrite-1, dendrite-function1, node-name16, ... ;
            dendrite-2, dendrite-function2, node-name38, ... ;
            soma, node-name23, node-name42.

The initial-value parameter allows a different initial value to be selected
for a node. The length-of-history of a node indicates how many past
excitation levels should be saved, which is useful in modeling exponential
decay. The node-function is implemented in C; it exists in the PUNNS
run-time environment and is fired when executed by the simulator. The
dendrite-function serves the same purpose as the node-function. There is no
provision for modeling a delay in dendritic propagation outside of synaptic
transmission. The dynamic behavior of neural networks can be modified with
a time-delay option that is synonymous with multiple synaptic delays.
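The node description above can be pictured with a small sketch. It is
written in Python for brevity and is not PUNNS code (PUNNS node functions
are written in C); the class name Node, the decay parameter, the
synchronous step function and the update rule are all assumptions made for
illustration. It shows how a bounded history of past excitation levels
supports exponential decay, and how somatic and dendritic inputs could be
combined.

```python
from collections import deque


class Node:
    """Hypothetical lumped-parameter node; not the PUNNS implementation."""

    def __init__(self, name, initial_value=0.0, length_of_history=1, decay=0.5):
        self.name = name
        self.decay = decay  # assumed fraction of the old level kept per step
        # Keep only the most recent `length_of_history` excitation levels.
        self.history = deque([initial_value], maxlen=length_of_history)
        self.dendrites = []  # list of (dendrite_function, [source nodes])
        self.soma = []       # source nodes connected directly to the soma

    @property
    def level(self):
        return self.history[-1]

    def compute(self):
        """Return the next excitation level without committing it."""
        drive = sum(src.level for src in self.soma)
        for fn, sources in self.dendrites:
            drive += fn([src.level for src in sources])
        return self.decay * self.level + drive

    def commit(self, value):
        self.history.append(value)


def step(nodes):
    """Fire all nodes synchronously: compute every new level, then commit."""
    new_levels = [n.compute() for n in nodes]
    for n, v in zip(nodes, new_levels):
        n.commit(v)


def demo():
    src = Node("src", initial_value=1.0, decay=0.0)   # one-step pulse
    out = Node("out", length_of_history=4, decay=0.5)
    out.soma.append(src)
    trace = []
    for _ in range(4):
        step([src, out])
        trace.append(out.level)
    return trace
```

In this sketch a single input pulse drives the output node once, after
which its excitation level halves at every step; the saved history holds
the decaying trace.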

PUNNS has been used to model and simulate pre-attentive texture
segmentation (36) and the generation of matching heuristics from
time-varying images (18). Figure 3 illustrates how a conceptual structure,
in this case a center-surround receptive field, is analyzed using PUNNS.
This receptive field has a strongly excitatory center and a concentric
inhibitory surround. When multiple, overlapping center-surround receptive
fields are applied to an image (fig. 3a), the result is a pattern of
activity that highlights discontinuities in image intensities (fig. 3b). As
the transition in intensity becomes stronger, the node's excitation level
increases. This structure was easily prototyped in the PUNNS environment
and the simulation time was under thirty seconds.

Figure 3. Example of the input image applied to PUNNS simulating a layer of
nodes with center-surround antagonistic receptive fields (a). The
activities of these nodes in response to this stimulus are shown in (b).
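The effect of overlapping center-surround fields on an image can be
sketched in a few lines. The Python fragment below is an illustration only:
the square 3x3 neighborhood, the equal surround weights and the function
name center_surround are assumptions, not the receptive fields used in the
PUNNS experiment. Each node subtracts the mean of its eight neighbors from
its center pixel, so uniform regions give zero and an intensity step gives
a paired negative and positive response.

```python
def center_surround(image):
    """Center pixel minus the mean of its 8 neighbors (interior pixels only)."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            surround = sum(
                image[r + dr][c + dc]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0)
            ) / 8.0
            out[r][c] = image[r][c] - surround
    return out


# A dark half (0) next to a bright half (8): a vertical step edge.
image = [[0, 0, 0, 8, 8, 8] for _ in range(5)]
response = center_surround(image)
```

Doubling the step height in this sketch doubles the response at the edge,
which matches the statement that excitation grows with the strength of the
intensity transition.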


IV. APPLICATIONS: VISION THROUGH CONNECTIONS 

In this section we present examples of two early vision functions 
which have been implemented and analyzed using principles of neural net- 
works. 

1. Constancy preprocessor 

The success of autonomous mobile robots depends on the ability to
understand continuously changing scenery. Present techniques for analysis of
images are not always suitable because, in a sequential paradigm,
computation of visual functions based on absolute values of stimuli is
inefficient. Important aspects of visual information are encoded in
discontinuities of intensity; hence a representation in terms of relative
values seems advantageous (2, 3). This example deals with the computing
architecture of a massively parallel vision module that optimizes the
detection of relative intensity changes in space and time.

Visual information must remain constant despite variation in the ambient
light level or in the velocity of a target or a robot. Constancy can be
achieved by normalizing motion and lightness scales. In both cases, the
basic computation involves a comparison of the center pixels with the
context of surrounding values. Therefore, a similar computing architecture,
composed of three functionally different and hierarchically arranged layers
of overlapping operators, can be used for the two integrated parts of the
module. The first part maintains high sensitivity to spatial changes by
reducing noise and normalizing the lightness scale. The result is used by
the second part to maintain high sensitivity to temporal discontinuities
and to compute relative motion information. Conceptually, the constraints
and the rules of transformation are embedded into a computing structure
which transforms the original image into two new representations. One
carries the information about discontinuities in space while the other
represents intensity changes in the time domain. This is consistent with
the notion of space-time equivalence, which suggests a hierarchical design
where spatial normalization is performed before dealing with the temporal
domain.

Simulation results show that the response of the module is proportional to
the contrast of the stimulus and remains constant over the whole domain of
intensity. It is also proportional to the velocity of motion limited to any
small portion of the visual field. Uniform motion throughout the visual
field results in a constant response, independent of velocity. Spatial and
temporal intensity changes are enhanced because, computationally, the
module resembles the behavior of a difference-of-Gaussians (DOG) function.

1a. Spatio-temporal considerations

Natural illumination can vary by ten logarithmic units of intensity. This
exceeds the response range of artificial or biological sensors (3, 40).
Hence, the first problem is how to maintain constant sensitivity to light
changes over the whole intensity domain while preserving a "unique mapping"
between the reflectance properties of the surfaces and the perceptual
notion of lightness. Linear variations of intensity are usually a
consequence of nonuniform illumination (38) and can be filtered out without
losing meaningful information. The new representation of the image is
expressed as relative values of intensity, which correspond to spatial
discontinuities generated by object boundaries. The absence of a DC
component introduces a need for some reference point necessary to achieve
lightness constancy.


Lightness constancy can be viewed as a problem of maintaining high
sensitivity regardless of the local or global ambient light level (39).
This implies a constant response when the illumination throughout the scene
is multiplied by a constant. In addition, essential information such as
edges must be preserved. One solution is to have sensors with a steep
intensity-response (I-R) characteristic, spanning 3 log units of intensity,
and a mechanism that automatically shifts the operating curve to the
prevailing ambient light level (40).

Nearby areas of a scene tend to have approximately equal illumination and
reflectance. Hence, we use local intensity averages to set the upper and
lower thresholds of the response curves. This is done automatically by
adjusting the midpoints of the I-R characteristics to the local ambient
light levels (40). Invariance under local addition of a linear illumination
bias is thereby achieved. A similar argument holds for global averages,
which in addition reduce sensitivity to noise by removing the bias due to
the overall average illumination.
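The idea of sliding the midpoint of the I-R characteristic to the local
ambient level can be illustrated as follows. This Python sketch assumes a
saturating curve of the Naka-Rushton form, R = I^n / (I^n + s^n), with the
semi-saturation point s set to the local mean intensity; that functional
form, the exponent n and the function names are assumptions chosen for
illustration, not the curve used in the module. With this choice the
response is unchanged when every intensity in the patch is multiplied by
the same constant.

```python
def response(intensity, local_mean, n=1.0):
    """Saturating I-R curve whose midpoint sits at the local mean intensity."""
    return intensity ** n / (intensity ** n + local_mean ** n)


def normalized_responses(intensities):
    """Respond to each intensity relative to the mean of its patch."""
    mean = sum(intensities) / len(intensities)
    return [response(i, mean) for i in intensities]


patch = [10.0, 20.0, 40.0]
dim = normalized_responses(patch)
# Same surfaces under illumination that is 1000 times (3 log units) brighter.
bright = normalized_responses([1000.0 * i for i in patch])
```

An intensity equal to the local mean always maps to the middle of the
response range, so each operator keeps its steep operating region centered
on the prevailing light level.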

The detailed description of normalization is given in (17). Briefly, this
operation is performed by spatial operators with two antagonistic zones, a
center spot and a surrounding annulus, better known as center/surround
receptive fields (C/S-RF) (3, 41). The C/S-RF uses lateral inhibition to
emphasize contrast, or relative value, as novelty (42). This normalizes the
center signal against the spatial context information derived from the
surround. Such a function is equivalent to a comparison of spatially
distinct areas of the image. The principle of antagonistic receptive fields
is applied to all operators working on the image.

A conceptually similar problem arises in the temporal domain. Most of the
objects in the real world are rigid and move with constant velocity (43).
Information about them is contained in temporal discontinuities, which
must be detectable regardless of ambient motion levels. Again, the limited
response range of each operator necessitates continuous adjustment of
operating characteristics to the ambient local velocity. Hence, the system
must normalize the temporal scale by resetting thresholds computed from
relative, rather than absolute, values. Temporal information can be derived
by comparing the activities of two C/S operators of opposite polarity
(3, 17). The time difference in the response waveshapes of the two
operators will produce a transient that carries the information about the
onset/offset of change. This transient resembles a time derivative of
intensity and is used to normalize the temporal scale.
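How a timing difference between two operators of opposite polarity yields
a derivative-like transient can be shown with a toy model. In the Python
sketch below each operator is represented by a first-order low-pass filter;
the filter form, the two rate constants and the function names are
assumptions for illustration and are not taken from the module. The
positive operator responds quickly, the negative one slowly, and their sum
is non-zero only around the onset of a change.

```python
def low_pass(signal, rate):
    """First-order low-pass filter: y += rate * (x - y) at each time step."""
    y, out = 0.0, []
    for x in signal:
        y += rate * (x - y)
        out.append(y)
    return out


def transient(signal, fast=0.5, slow=0.25):
    """Sum of a fast positive operator and a slow negative operator."""
    on = low_pass(signal, fast)                  # positive polarity, peaks early
    off = [-v for v in low_pass(signal, slow)]   # negative polarity, peaks late
    return [a + b for a, b in zip(on, off)]


# Intensity steps from 0 to 1 at time index 5 and then stays constant.
step_input = [0.0] * 5 + [1.0] * 40
t = transient(step_input)
```

For the step input the summed response is zero before the step, peaks
shortly after it and then returns toward zero while the intensity remains
constant, much like a smoothed time derivative.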

It is clear from these spatio-temporal considerations that our visual
system must first normalize intensity changes in space and time. The way to
subtract the DC component is to use antagonistic receptive fields
implemented by lateral inhibition. The net result is a double
representation of the image: one carrying spatial, and the other temporal,
information. Both of them resemble the effect of convolving the original
visual information with a center-surround filter resembling a difference of
Gaussians (DOG) (2, 3).

1b. Structural details

Our computing architecture for normalization of the lightness scale was
inspired by natural vision systems (41). Major structural components
include lateral interactions between neighboring elements within a layer,
and converging and diverging pathways between the layers (fig. 4). Overlap
between operators helps to enrich the representation of the contrast
information across the boundaries between different receptors. For the sake
of simplicity, local structures remain constant across the module.




Figure 4. Flow of information in the generalized normalization module (a).
Center-surround antagonism of an output operator. Seven large context
operators determine the surround response and seven smaller input operators
determine the center response (b). The output operator compares the two
responses.

The input to the spatial module is analogous to a layer of cone
photoreceptors arranged in a hexagonal array. The output operators
functionally resemble bipolar cells found in the vertebrate retina. Their
responses are normalized by subtracting a local average computed by context
operators. There are two types of output operators, which differ in
polarity and time to peak of response. The context information is always of
opposite polarity to the center signal. Representing all relative values in
the form of two opposite polarity masks improves the stability of
spatial-temporal interpolation and provides phase-like information about
the original signal (45). Also, positive and negative operators differ in
their coverage of the visual field and hence in spatial information.
Therefore, if both positive and negative operators display a zero-crossing
or a peak, it is more likely that the phenomenon is not an artifact created
by noise, but a sign of a significant discontinuity in intensity.

The computation performed by output operators combines two levels of
resolution in the sense that the large RF operators set the thresholds for
the small ones in their area of activity. Other approaches are possible
(46, 47), but for our initial implementation, we selected the simplest
solution. The result computed at the output is then roughly the averaged
second difference of the input intensities. Our method of combining the
different levels of resolution is of particular interest because it was
implemented using a simple, universal architecture based upon lateral
inhibition. To simplify the simulation, we assume that the signals of the
photoreceptors converging onto a given surround or output layer operator
are linearly combined and that inhibition is a simple linear operation.
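The claim that a center average minus a surround average behaves like an
averaged second difference can be checked on a one-dimensional signal. The
Python sketch below is a simplification under stated assumptions: it uses a
1-D line of inputs rather than the hexagonal array, equal weights, and
window radii of 1 (center) and 3 (surround) chosen arbitrarily; the
function names are hypothetical. A linear intensity ramp produces no
output, a parabola (constant second difference) produces a constant output,
and a step edge produces a response of both signs.

```python
def window_mean(signal, i, radius):
    """Mean of signal over the window [i - radius, i + radius]."""
    window = signal[i - radius:i + radius + 1]
    return sum(window) / len(window)


def output_operator(signal, center_radius=1, surround_radius=3):
    """Center average minus surround average at every interior position."""
    r = surround_radius
    return [
        window_mean(signal, i, center_radius) - window_mean(signal, i, r)
        for i in range(r, len(signal) - r)
    ]


ramp = [2.0 * x + 5.0 for x in range(12)]      # linear illumination gradient
parabola = [float(x * x) for x in range(12)]   # second difference is 2 everywhere
edge = [0.0] * 6 + [7.0] * 6                   # step discontinuity

ramp_out = output_operator(ramp)
parabola_out = output_operator(parabola)
edge_out = output_operator(edge)
```

With these radii the parabola gives -10/3 at every position, that is, a
fixed negative multiple of its second difference, while the ramp is removed
entirely. This is the sense in which linear illumination gradients are
filtered out and discontinuities are kept.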


Fig. 5 shows the combined architecture of both modules. Modularity and
parallelism simplify signal processing without any ad hoc assumptions
about image statistics. The temporal module also consists of three
functionally distinct, two-dimensional layers of C/S operators, arranged in
a regular hexagonal lattice. The centers of the RFs overlap, and their
sizes are different in distinct layers. To facilitate simulation, we chose
to model only a small part of the visual field; hence we may assume that
the sizes of the RFs remain constant throughout each layer.

Figure 5. Hierarchical architecture of integrated spatial and temporal
modules.


The inputs to the temporal module are two signals (I+ and I-) generated by
the spatial module. They are of opposite polarity, display differences in
their temporal behavior, and are regularly interspaced. Half of the
temporal input operators receive I+ and the rest I-. A spatial
discontinuity appearing at time t will generate a maximal response in I+ at
t1 and in I- at t2, with t1 and t2 not equal. This difference carries the
information about the onset of temporal changes.


The time derivative is computed by an input operator which compares the
information about the present input signal with values in the recent past.
The source of the information about past values is feedback from the
context operators. The feedback from global and local temporal context
operators does not interfere with a signal normalized in space by the first
submodule. This is similar to the action of local synaptic effects in
amacrine cells. Context operators act to predict the future transient
response to motion. The normalization of the temporal scale is achieved by
shifting the velocity-response curve of the output operator over the domain
of target velocities. This is based on a comparison of motion in spatially
separated areas of the visual field. Conceptually, the DOG of dI/dt
computes temporal information which appears as a transient. However, with
rapid motion, when the input intensity in the center of the RF fluctuates
sharply, large positive and large negative derivatives could cancel during
the computation of the Gaussian. Hence the need for the DOG of the absolute
value of the derivative (48). The context layer operators, which compute
the feedback, rectify in the sense that negative signals are attenuated. In
biological systems, the amacrine and ganglion cells rectify in order to
facilitate the frequency coding used in the transmission of signals over
long distances (3). This rectification is functionally similar to taking
absolute values in our module.
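The cancellation problem that motivates rectification can be reproduced
with a short numerical example. In the Python sketch below a box average
stands in for the Gaussian smoothing stage; that substitution, the window
radius and the function names are assumptions for illustration only. A
rapidly flickering intensity gives derivatives that alternate in sign, so
smoothing the signed derivative nearly cancels, whereas smoothing its
absolute value preserves the magnitude of the change.

```python
def derivative(signal):
    """Discrete time derivative: difference between successive samples."""
    return [b - a for a, b in zip(signal, signal[1:])]


def smooth(values, radius=2):
    """Box average standing in for the Gaussian of the DOG (an assumption)."""
    out = []
    for i in range(radius, len(values) - radius):
        window = values[i - radius:i + radius + 1]
        out.append(sum(window) / len(window))
    return out


flicker = [0.0, 4.0] * 8                  # intensity fluctuating at every time step
d = derivative(flicker)                   # +4, -4, +4, ...
raw = smooth(d)                           # signed derivatives largely cancel
rectified = smooth([abs(v) for v in d])   # rectified derivatives do not
```

Here the smoothed signed derivative never exceeds one fifth of the true
step size, while the rectified version reports the full magnitude at every
position.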


1c. Simulation results

Fig. 6 is a simple demonstration of the lightness constancy function. The
output signal from the module does not change significantly as the uniform
background intensity is varied. Here the input, represented by the
horizontal axis, is intensity on a logarithmic scale, with 0 log units
being equivalent to darkness. The vertical axis represents the logarithm of
the difference between the extreme responses on the light and dark sides of
the discontinuities.

Figure 6. Response of the spatial module to increasing contrast at various
levels of ambient light intensity.


The behavior of our module in response to a moving discontinuity of
intensity is shown in Fig. 7, where the vertical axis represents the
maximal response of an output operator in the center of the visual field.
The horizontal axis represents velocity in interoperator units per
iteration. One interoperator unit is the distance between two neighboring
input operators. One iteration is the amount of time it takes for a signal
to go from an input operator to the context layer and back to the same
input operator via the immediate feedback. In all cases the input signal is
a sharp discontinuity of intensity. The part of the discontinuity in the
center of the visual field is moving at one velocity and the rest of it is
moving at a possibly different velocity. These are called local velocity
(vl) and global velocity (vg), respectively. Fig. 7 shows that if velocity
is constant throughout the visual field, the response is small and almost
independent of vl. However, if motion is restricted to a small part of the
visual field (i.e. vg = 0), a roughly linear response is observed. This
illustrates the fact that our module detects relative rather than absolute
motion.

2. A neural net to extract motion heuristics

The internal representation of the world that is used by a visually guided
robot must be updated and maintained using the sensory data derived from
the environment. Establishing a correspondence between the viewer-centered
sensor data and an object-centered internal representation is an expensive
computational task (49). Therefore, a roving robot must either sit for a
while and contemplate its new position, or move under assumptions which are
a few steps behind the real world (50). Typically, the correspondence
process forms an initial match between a perceived object and its internal
model and then, as the object moves with respect to the roving robot, the
orientation of the model may need to be updated to reflect current sensor
information (51). This paper demonstrates how a connectionist architecture
can speed up the matching of an internal 3-D model to changing edge
features by precomputing future positions of the edge features and
providing the matcher with heuristic information describing in which
direction to start manipulating the model.

Figure 7. Motion throughout the visual field produces a small response
(Vg = Vl). Motion in a small region induces a large, roughly linear
response (Vglob = 0).

The recognition of an object must involve the matching of some input data
to an internal representation of an object. The matching can be
accomplished by either 1) manipulating the data and comparing it to a set
of fixed models or 2) transforming the model to match the captured edge
features. As an object in a scene moves, the 2-D projections of its
boundaries and key features appear to undergo translation, rotation, and
occlusion. This suggests that the second method is more natural because we
do not need to compute the position of occluded edges. Also, the second
method is more suitable for a goal-driven system (52). The constantly
updated model becomes a representation of the world that can support scene
interpretation, planning, and other higher-level cognitive functions.
Manipulating the model requires the matcher to rotate and translate its
internal model in an attempt to match the current edge features. In this
approach the internal model is continuously trying to catch up to the real
world. A speed-up would occur if the matcher received, along with the
incoming data, a preliminary guess of which way the features were rotating
or translating.


Figure 8. Overall connectionist architecture of a network used to extract
motion heuristics.


Most existing matchers are based on graph theoretic algorithms 
which execute in exponential time with respect to complexity of the graph 
description (53). The matcher establishes a correspondence between the 
internal model representation and the edge features of an image. In this pa- 
per, we assume that this part of the matcher is given. We are concentrating 
on the problem of how the matcher can maintain the established correspon- 
dence as an object is undergoing smooth or discontinuous motion. 

To maintain the correspondence, the matcher could precompute numerous new
orientations of the internal model and have them ready for incoming data.
But this precomputation technique would be time consuming and unwieldy,
since it substantially increases the graph size. Incoming data, though, can
be used to give specific suggestions on how the matcher should manipulate a
model. A technique for precomputing possible future positions of the edge
features is the first step in formulating a model manipulation heuristic
for the matcher.

By using a connectionist architecture (9), we hope to understand how visual
functions can be derived from massively parallel computing structures.
Additionally, neurophysiological evidence can be used to inspire possible
interconnectivity solutions (17, 54, 55). Our mechanism for precomputation
is partially motivated by the structure of the early visual cortex, which
has been extensively reviewed elsewhere (56). This region of the cortex is
composed of vertical slabs which contain neurons sensitive to contrast
edges of a preset orientation in a particular region of the visual field
(56, 57). Within each slab, there is also a convergence of information
related to color and motion.


We have limited our implementation of vertical slabs to the simulation of
their edge orientation information. In a fashion similar to the visual
cortex, edge detectors of differing orientations over the same spatial
sub-region are grouped together and locally interconnected. Such a group of
oriented edge detectors is called a column. A column contains all of the
available orientation information for its particular sub-region of the
image. In the future, we hope to model the robustness of the vertical slabs
in the visual cortex more realistically.



Figure 9. Propagation nodes are interconnected to propagate the
direction-specific activities of edge detectors (a). A computation node
requires simultaneous activity in both its propagation node and its edge
detector before it will signal the response.

Fig. 8 outlines the computational hierarchy of the architecture. The light
from a scene is initially transduced into electrical signals by a layer of
photosensors. These signals are then processed by spot detectors which are
sensitive to local changes in image intensity. The center receptive fields
of the spot detectors overlap each other by thirty percent. The outputs
from the spot detectors are grouped to form oriented edge detectors, which
are then organized into columns. It should be noted that this
implementation deliberately differs from the known neurophysiological data
because of the limitations of our simulation tools. Surrounding each edge
detector are propagation nodes which compute where the edge may move in the
future by exciting the propagation nodes in adjacent columns. Oriented edge
information is used by both the precomputation layer and the matcher. The
precomputation layer gives the matcher heuristic information on the
direction of a moving edge feature. Computation nodes in this layer are
able to guess at the direction of an edge by comparing the excitation
levels of oriented edge operators and the surrounding propagation nodes.

The lowest layer of the architecture, which extracts changes in image
intensity by using center-surround receptive fields, has been detailed in
(18). Briefly, the image is first filtered by a layer of nodes with
center/surround antagonistic receptive fields. To reduce the simulation
complexity, this layer was modeled using a convolution operator.

Analysis of the information available from motion makes it apparent that
there are only a few possible directions that an edge feature could take
without violating the heuristics used for matching points in separate
images (58). Considering only rigid physical objects with limited velocity,
the motion is limited to a few possible next-frame positions and
directions. Hence, in principle, it is possible to simultaneously tell the
matcher where the edges are and how they are moving.

To accomplish this objective, we organize the oriented edge detec- 
tors within a sub-region of the image into a column and then bring the 
columns together to form a cube. A transverse slice of the cube contains all 
of the edge detectors of a particular orientation over the entire image. When 
an edge becomes active, indicating that the current image has an edge feature 
at that location and orientation, we want to use that fact to prepare for future 
movement of that edge feature. 

A moving edge feature can at most activate one of six nearest-neighbor
edge detectors in our hypercolumn. To monitor this change, each oriented
edge detector in a column is connected to six propagation nodes (p-nodes),
four translational and two rotational. Thus, a specific p-node will
transmit the activity of its edge detector in one of the six possible
directions. By propagating the excitation of an edge detector, the p-nodes
prime the network for specific, future orientations of an edge feature
(fig. 9).



A computation node (c-node) combines the information from an oriented edge
detector and its associated p-nodes. A c-node will only fire when its edge
detector and one p-node are high. Of course, prior to the arrival of the
edge feature, high activity of one p-node implies a potential direction of
motion that can be signaled to the matcher.
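The interplay of edge detectors, p-nodes and c-nodes can be sketched for
the simplest case. The Python fragment below is a reduced illustration, not
the network simulated in PUNNS: it models a single row of columns, one edge
orientation and only the p-nodes for one translational direction, and the
decay factor, firing threshold and function name run are assumptions. Each
p-node is primed by the edge detector (and, with decay, the p-node) of its
left neighbor from the previous step, and a c-node fires only when its own
edge detector and its p-node are both high.

```python
def run(edge_positions, width, decay=0.5, threshold=0.2):
    """Return, per time step, the columns whose rightward c-node fires."""
    p_right = [0.0] * width     # rightward translational p-node of each column
    prev_edge = [0.0] * width   # edge detector activity at the previous step
    fired = []
    for pos in edge_positions:
        edge = [1.0 if x == pos else 0.0 for x in range(width)]
        # Prime each p-node from the edge detector and p-node on its left.
        new_p = [0.0] * width
        for x in range(1, width):
            new_p[x] = max(prev_edge[x - 1], decay * p_right[x - 1])
        p_right = new_p
        # A c-node needs both its edge detector and its p-node to be high.
        fired.append([x for x in range(width)
                      if edge[x] >= threshold and p_right[x] >= threshold])
        prev_edge = edge
    return fired


smooth_motion = run([1, 2, 3, 4], width=8)   # one column per time step
jump_motion = run([1, 4, 7], width=8)        # faster than the propagation rate
```

In this sketch an edge advancing one column per step makes the c-node at
its new position fire at every step after the first, whereas an edge that
jumps three columns per step outruns the priming signal and no c-node
fires.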

2a. Example 

Fig. 10 illustrates the changing excitation levels of the p-nodes and
c-nodes over time. In this example, a bar is moving from left to right
across the visual field (Fig. 10a). Fig. 10b demonstrates how the
excitation levels of edge detectors are being propagated, in a rightward
direction, by the +y translational p-nodes. When both the p-nodes and the
edge detectors are excited, the c-nodes will momentarily fire (fig. 10c)
and provide heuristic information to the matcher.


The precomputation layer of our connectionist architecture can provide
heuristic information useful in matching 3-D models to time-varying edge
features. If the velocity of an edge feature should exceed the propagation
rate of the p-nodes, then the c-nodes will not be excited and the matcher
will not receive any heuristic information. The matcher could interpret
such an edge as being part of either a new object in the scene or an object
that is undergoing discontinuous jumps.

CONCLUSION 

New approaches to machine sensing and perception were presented. The
motivation for cross-disciplinary studies of perception in terms of AI and
the Neurosciences was outlined. The question of computing architecture
granularity as related to the global/local computation underlying
perceptual function was considered, and examples of two environments were
given. Finally, examples of using one of the environments, UCLA PUNNS, to
study neural architectures for visual function were presented.


Figure 10. The stimulus is a time-varying image of a vertical bar moving
from left to right (a). The p-nodes propagate the exponentially decaying
signal about a vertically oriented edge moving in the +y direction (b).
When the bar moves to the right, the c-node becomes active and sends the
information to the matcher that this edge feature has undergone left to
right translation (c).